Abstract

We propose a novel architecture to design a neural associative memory that is capable of learning a large number of patterns
and recalling them later in the presence of noise. It is based on dividing the neurons into local clusters and parallel planes, very
similar to the architecture of the visual cortex of the macaque brain. The features that our proposed architecture shares with
spatially-coupled codes enable us to show that the performance of such networks in eliminating noise is drastically better than
that of previous approaches, while maintaining the ability to learn an exponentially large number of patterns. Previous work
failed either to provide good performance during the recall phase or to offer large pattern retrieval (storage) capacities. We also
present computational experiments that lend additional support to the theoretical analysis.

I. Introduction 

While relying on iterative operations of simple (and sometimes faulty) neurons, our brain is capable of retrieving the correct 
"memory" with high degrees of reliability even when the cues are limited or inaccurate. Not surprisingly, designing artificial 
neural networks capable of accomplishing this task, called associative memory, has been a major point of interest in the 
neuroscience community for the past three decades. This problem, at its core, is in fact very similar to the problem of reliable information
transmission faced in communication systems, where the goal is to find mechanisms to efficiently encode and decode a set of
transmitted patterns over a noisy channel. More interestingly, the novel techniques employed to design good codes are extremely 
similar to those used in designing and analyzing neural networks. In both cases, graphical models, iterative algorithms, and 
message passing play a central role. 

Despite these similarities in the task and in the techniques, we witness a huge gap in terms of the efficiency achieved by
them. More specifically, by using modern coding techniques, we are capable of reliably transmitting $2^{rn}$ binary vectors of
length $n$ over a noisy channel ($0 < r < 1$). This is achieved by intelligently introducing redundancy among the transmitted
messages, which is later used to recover the correct pattern from the received noisy version. In contrast, until recently, artificial
neural associative memories were only capable of memorizing $O(n)$ binary patterns of length $n$ (see [1], [2], [3], [4]).

Part of the reason for this gap goes back to the assumption held in the mainstream work on artificial associative memories
which requires the network to memorize any set of randomly chosen binary patterns. While it gives the network a certain 
degree of versatility, it severely hinders the efficiency. 

To achieve an exponential scaling in the storage capacity of neural networks, Kumar et al. [5] suggested a different viewpoint
in which the network is no longer required to memorize any set of random patterns but only those that have some common
structure, namely, patterns that all belong to a subspace with dimension $k < n$. By assuming that the connectivity matrix of the
neural graph is given, the authors proposed a simple iterative algorithm for the recall phase. However, they did not propose
any algorithm for the learning phase, i.e., learning the connectivity matrix. This task was then accomplished in [16]. Although
the proposed approaches are capable of achieving exponential scaling for the pattern retrieval capacity, the performance of the
algorithms employed in the recall phase is still far from desirable.

In this work, we follow the same viewpoint as in [5], where we only focus on memorizing a set of patterns with some degree
of redundancy. However, we propose a different neural architecture to capture local correlations among patterns, similar to the
way that the receptive fields in the human visual cortex are arranged. We use the algorithm proposed in [16] for the learning phase.
For the recall phase, we also employ the algorithm proposed in [11] as a building block. However, due to the novel structure,
we need a more powerful tool to investigate the performance of the recall algorithm. This is done by making use of the
recent developments in the analysis of spatially-coupled codes [14], [13].

II. Related Work 

The first artificial neural associative memory was introduced in the pioneering work of Hopfield [1]. A "Hopfield network"
is a complete graph of $n$ neurons, equipped with the Hebbian learning rule [6] to memorize a set of randomly chosen binary
patterns of length $n$. It was shown by McEliece et al. [7] that the pattern retrieval capacity of Hopfield networks is $C = n/(2\log n)$.



Since then, there have been many attempts to increase the pattern retrieval capacity of neural associative memories. In
particular, Venkatesh et al. [2] sacrificed the online learning capability to increase the capacity by calculating the weight matrix
offline [12]. The result is a pattern retrieval capacity of $n/2$ random patterns with the ability of one-bit error correction.
Jankowski et al. [3] investigated a different model with multi-state associative memory in which each neuron can be assigned a
multivalued state from the set of complex numbers. It was shown by Muezzinoglu et al. [4] that the capacity of such networks
can be increased to $C = n$ at the cost of a prohibitive weight computation mechanism.

More recently, a new perspective has been proposed with the aim of memorizing only those patterns that possess some
degree of redundancy. In this framework, a tradeoff is made between versatility (i.e., the capability of the network to
memorize any set of random patterns) and the pattern retrieval capacity. Pioneering this frontier, Berrou and Gripon considered
memorizing patterns based on Walsh-Hadamard codes [9]. While the suggested approach increases the capacity beyond $n$, the
complexity of operations in the recall phase was prohibitive. Gripon and Berrou [8] recently proposed another method based
on neural cliques, which increases the pattern retrieval capacity potentially to $O(n^2/\log(n))$ with a low-complexity algorithm
in the recall phase. The proposed approach is based on memorizing a set of patterns mapped from randomly chosen binary
vectors of length $k = O(\log(n))$ to the $n$-dimensional space. This way, one could benefit from the extra redundancy within the
patterns to increase the capacity as well as to correct errors, much in the same way error correcting codes do in communication
systems.

By considering patterns that come from a subspace with dimension $k < n$, Kumar et al. [5] were able to show an exponential
scaling in the pattern retrieval capacity, i.e., $C = O(a^n)$, with some $a > 1$. This model was later extended to modular patterns,
i.e., those in which patterns are divided into non-overlapping sub-patterns where each sub-pattern comes from a subspace [11],
[16]. The authors proposed a simple iterative learning algorithm and achieved an improved performance in the
recall phase as compared to [5].

In this paper, we follow the same line of work by extending the model proposed in [11], [16] to neural networks with
modular structure in three dimensions, similar to the way the visual cortex of the macaque brain is organized [17]. The proposed
model is based on overlapping local clusters with neighboring neurons, where clusters are arranged in parallel planes. At
the same time, there are sparse connections between various clusters in different planes. The aim is to memorize only those
patterns for which local sub-patterns in the domain of each cluster show a certain degree of redundancy.

Interestingly, this model is very similar to spatially-coupled codes on graphs [13]. This similarity helps us borrow analytical
tools developed for analyzing such codes [14] and investigate the performance of our proposed error correcting algorithm.
Specifically, our suggested model is closely related to the spatially-coupled Generalized LDPC code (GLDPC) with Hard
Decision Decoding (HDD) proposed in [15]. In our model, the clusters can be regarded as "component codes" in the GLDPC
ensemble and the parallel planes as spatial coupling of the individual decoders. However, in contrast to [15], our proposed error
correction algorithm is purely based on simple operations performed by neurons, with message passing even within clusters
(i.e., component codes). Furthermore, due to the structure of neural networks, a neuron sends out the same message to all of
its neighbors, as it cannot differentiate among them. This is different from the type of message passing algorithms (e.g., belief
propagation) performed on LDPC codes.

III. Problem Setting and Notations 

Let $\mathcal{X}$ denote a dataset of $C$ patterns of length $n$. In this paper, we assume that patterns are integer-valued with entries
in $\{0, \ldots, S-1\}$. A natural way of interpreting this assumption is to consider the entries as the short-term firing rates of the
corresponding neurons. We divide each pattern into $L$ sub-patterns of the same size and call them planes. Within each plane,
we further divide the patterns into $D$ overlapping clusters, i.e., an entry in a pattern can lie in the domain of multiple clusters
(within the same or other planes). We also assume that each pattern neuron in plane $\ell$ is connected to at least one cluster in
planes $\ell - \Omega, \ldots, \ell + \Omega$ (except at the boundaries). Therefore, each pattern neuron is connected to $2\Omega + 1$ planes, on average.

Finally, to introduce redundancy, we assume that entries of each cluster belong to a subspace (or more generally, having 
negligible minor components). 

Learning phase: This phase corresponds to learning the minor components or dual vectors for each cluster. In this paper,
we use the learning algorithm proposed in [16]. The output of the learning algorithm is a matrix $W^{(\ell,d)}$ for cluster $d$ in plane
$\ell$. The rows of this matrix correspond to the dual vectors and the columns correspond to the pattern nodes. Therefore, by
letting $x^{(\ell,d)}$ denote the sub-pattern corresponding to the domain of cluster $d$ of plane $\ell$, we have

$$W^{(\ell,d)} \cdot x^{(\ell,d)} = 0. \tag{1}$$

By treating these matrices as connectivity matrices of the neural graph, one can consider each cluster as a bipartite graph 
composed of pattern and constraint neurons. The constraint neurons do not have any overlap (i.e. each one belongs only to 
one cluster) whereas the pattern neurons can have connections to multiple clusters. To ensure good error correction capabilities 
we aim to keep these connections sparse. 

Putting all the local connectivity matrices together, we obtain the model shown in Figure 1. Interestingly, this model is very
similar to the neural architecture that processes visual signals in the macaque brain [17]. As mentioned earlier, we have $L$ planes,
each with $n/L$ pattern neurons. Each plane contains $D$ clusters, where cluster $d$ in plane $\ell$ contains $m_{\ell,d}$ constraint neurons and is
connected to $n_{\ell,d}$ pattern neurons. Note that due to overlaps among clusters, we have $\sum_d n_{\ell,d} > n/L$.



Fig. 1: A coupled neural associative memory. 




Fig. 2: A connectivity graph with neural planes and super nodes. It corresponds to plane 1 of Fig. 1.

We also consider the overall connectivity graph of plane $\ell$, in which the constraint nodes in each cluster
are compressed into one super node. Any pattern node that is connected to a given cluster is connected with an (unweighted)
edge to the corresponding super node. Figure 2 illustrates this graph for plane 1 in Figure 1.

For this graph, let $\lambda_i^{(\ell)}$ and $\rho_j^{(\ell)}$ be the fraction of edges connected to pattern and constraint nodes with degree $i$ and $j$,
respectively. We define the degree distribution polynomials in plane $\ell$ from an edge perspective as $\lambda^{(\ell)}(x) = \sum_i \lambda_i^{(\ell)} x^{i-1}$ and
$\rho^{(\ell)}(x) = \sum_j \rho_j^{(\ell)} x^{j-1}$.
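
To make the edge-perspective definition concrete, the following sketch computes the edge fractions and evaluates the corresponding polynomial for a small plane. The degree lists are hypothetical numbers chosen only for illustration, and the function names are not taken from the paper.

```python
from collections import Counter

def edge_degree_distribution(degrees):
    """Map each degree i to the fraction of edges attached to nodes of degree i."""
    total_edges = sum(degrees)
    counts = Counter(degrees)
    return {i: i * c / total_edges for i, c in counts.items()}

def evaluate_polynomial(dist, x):
    """Evaluate sum_i dist[i] * x**(i - 1), the edge-perspective polynomial."""
    return sum(frac * x ** (i - 1) for i, frac in dist.items())

# Hypothetical plane: six pattern nodes and three super nodes (clusters).
pattern_degrees = [1, 1, 2, 2, 2, 2]   # clusters touched by each pattern node
cluster_degrees = [4, 3, 3]            # pattern nodes attached to each super node

lam = edge_degree_distribution(pattern_degrees)
rho = edge_degree_distribution(cluster_degrees)
lam_at_half = evaluate_polynomial(lam, 0.5)
```

Both degree lists sum to the same number of edges (ten here), and each polynomial evaluates to 1 at $x = 1$ because the edge fractions sum to one.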

Recall phase: This phase corresponds to retrieving correct memorized patterns in response to noisy queries. At this point,
the neural graph has been learned (fixed) and we are looking for a simple iterative algorithm to eliminate noise from the query.
In this paper, we assume that the noise is additive, i.e., the query is one of the memorized patterns plus some noise. The
noise itself is integer-valued and for simplicity we assume that its entries are in $\{-1, 0, +1\}$, where a $-1$ (resp. $+1$) corresponds
to a neuron skipping a spike (resp. firing one more spike than expected). More specifically, if the initial error probability is
denoted by $p_e$, then each entry of the noise vector is $+1$ or $-1$ with probability $p_e/2$ each. In this paper, we propose an algorithm
to eliminate this input noise.
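
The noise model described above can be sampled as in the following sketch. The function names and the fixed seed are choices made for this example only.

```python
import random

def sample_noise(n, p_e, rng):
    """Draw n i.i.d. entries: +1 or -1 with probability p_e/2 each, 0 otherwise."""
    noise = []
    for _ in range(n):
        u = rng.random()
        if u < p_e / 2:
            noise.append(1)
        elif u < p_e:
            noise.append(-1)
        else:
            noise.append(0)
    return noise

def noisy_query(pattern, p_e, rng):
    """Return the memorized pattern plus additive noise (the query)."""
    return [x + e for x, e in zip(pattern, sample_noise(len(pattern), p_e, rng))]

rng = random.Random(0)                      # fixed seed for reproducibility
noise = sample_noise(10000, 0.1, rng)
fraction_noisy = sum(1 for e in noise if e != 0) / len(noise)
```

For a long noise vector, `fraction_noisy` should be close to the chosen $p_e$.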

Pattern Retrieval Capacity: Finally, an ideal neural associative memory should have a large pattern retrieval capacity. This 
is the maximum number of patterns that can be memorized by a network while still being able to return reliable responses in 
the recall phase. 

IV. Main Results 

In this section, we briefly discuss our approach for addressing the aforementioned phases. 

Learning phase: As we mentioned earlier, for the learning phase we use the learning algorithm proposed in [16]. In the
remainder, we effectively assume that the connectivity matrices are known and satisfy Eq. (1).

Recall phase: The recall algorithm proposed in this paper is the extension of the one proposed in [11] to coupled neural
networks. For the sake of completeness, we have summarized this approach in Algorithm 1, the goal of which is to correct
(at least) a single error in a cluster with high probability. Let us recall the performance of Algorithm 1.

Theorem 1 ([11]): When $\varphi \to 1$, Algorithm 1 can correct a single error in each cluster with probability at least $1 - (\bar{d}/m)^{d_{\min}}$,
where $\bar{d}$ and $d_{\min}$ are the average and minimum degree of the pattern nodes within the cluster domain.
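
As a quick numerical reading of the bound in Theorem 1, the sketch below evaluates it for one set of values. The numbers are hypothetical, and $m$ is interpreted here as the number of constraint neurons in the cluster, which is an assumption since this section does not define it explicitly.

```python
def single_error_bound(d_avg, d_min, m):
    """Lower bound 1 - (d_avg / m) ** d_min on correcting a single error."""
    return 1.0 - (d_avg / m) ** d_min

# Hypothetical cluster: average degree 4, minimum degree 3, m = 20 (assumed meaning).
bound = single_error_bound(d_avg=4.0, d_min=3, m=20)
```

The bound approaches one when the pattern nodes are sparsely connected relative to $m$ and when the minimum degree grows.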

We apply Algorithm 1 sequentially to neural clusters and neural planes as follows. We start with the first cluster in the
first plane and apply Algorithm 1. Once finished, we check the final state of the constraint nodes, expecting them to be all
zero. If not, we revert the values of the pattern nodes to their initial values. We then proceed to the second cluster in the first
plane and continue until we reach the last cluster in the last plane. We repeat the above sequential scheduling $t_{\max}$ times (see
Algorithm 2). Repeating is particularly helpful since errors get corrected in each round, so that noisy clusters experience
less noise in the following rounds and Algorithm 1 succeeds on them at a higher rate.
Note that by Theorem 1, Algorithm 1 can only guarantee the correction of a single error. Hence, applying Algorithm 1 to
clusters with two or more errors might be unsuccessful. By sweeping $t_{\max}$ times over the clusters and planes, we try to correct
noisy neurons one by one.

Algorithm 1 Error Correction Within Cluster [11]

Input: Connectivity matrix $W^{(\ell,d)}$, threshold $\varphi$, iteration $t_{\max}$.
Output: Correct memorized sub-pattern $x^{(\ell,d)}$.
1: for $t = 1 \to t_{\max}$ do
2: Forward iteration: Calculate the weighted input sum $h_i = \sum_{j=1}^{n_{\ell,d}} W_{ij}^{(\ell,d)} x_j^{(\ell,d)}$ for each neuron $y_i^{(\ell,d)}$ and set:

$$y_i^{(\ell,d)} = \begin{cases} 1, & h_i < 0 \\ 0, & h_i = 0 \\ -1, & \text{otherwise} \end{cases}$$

3: Backward iteration: Each neuron $x_j^{(\ell,d)}$ computes

$$g_j^{(\ell,d)} = \frac{\sum_{i=1}^{m_{\ell,d}} W_{ij}^{(\ell,d)} y_i^{(\ell,d)}}{\sum_{i=1}^{m_{\ell,d}} |W_{ij}^{(\ell,d)}|}$$

4: Update the state of each pattern neuron $j$ according to $x_j^{(\ell,d)} = x_j^{(\ell,d)} + \mathrm{sgn}(g_j^{(\ell,d)})$ only if $|g_j^{(\ell,d)}| > \varphi$.
5: end for

Algorithm 2 Error Correction of the Coupled Network

Input: Connectivity matrices $W^{(\ell,d)}$, $\forall \ell, \forall d$; iteration $t_{\max}$.
Output: Correct memorized pattern $x = [x_1, x_2, \ldots, x_n]$.
1: for $t = 1 \to t_{\max}$ do
2: for $\ell = 1 \to L$ do
3: for $d = 1 \to D$ do
4: Apply Algorithm 1 to cluster $d$ of neural plane $\ell$.
5: Update the value of pattern nodes only if all the constraints in the cluster are satisfied.
6: end for
7: end for
8: end for
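
The sequential schedule can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the weight matrix `W`, the overlapping index sets, the threshold value 0.95 and the `inner_t_max` parameter are all hypothetical, and each toy plane holds a single cluster to keep the example short.

```python
def sign(v):
    return (v > 0) - (v < 0)

def correct_cluster(W, x, phi, t_max):
    """Sketch of Algorithm 1 for one cluster (W: list of rows, x: sub-pattern)."""
    x = list(x)
    for _ in range(t_max):
        # Forward iteration: y_i = 1 if h_i < 0, 0 if h_i = 0, -1 otherwise.
        y = [-sign(sum(w * v for w, v in zip(row, x))) for row in W]
        if not any(y):
            break  # every constraint is already satisfied
        # Backward iteration: normalized feedback for each pattern neuron.
        g = []
        for j in range(len(x)):
            norm = sum(abs(row[j]) for row in W)
            fb = sum(row[j] * y_i for row, y_i in zip(W, y))
            g.append(fb / norm if norm else 0.0)
        # Update only the neurons whose feedback magnitude exceeds phi.
        x = [v + sign(g_j) if abs(g_j) > phi else v for v, g_j in zip(x, g)]
    return x

def constraints_satisfied(W, x):
    return all(sum(w * v for w, v in zip(row, x)) == 0 for row in W)

def correct_coupled(planes, x, phi, t_max, inner_t_max=5):
    """Sketch of Algorithm 2: sweep clusters plane by plane, keep only valid fixes."""
    x = list(x)
    for _ in range(t_max):
        for clusters in planes:
            for W, idx in clusters:
                candidate = correct_cluster(W, [x[i] for i in idx], phi, inner_t_max)
                if constraints_satisfied(W, candidate):
                    for i, v in zip(idx, candidate):
                        x[i] = v  # accept; otherwise the old values are kept
    return x

# Hypothetical sparse weights: every row sums to zero, so all-ones is a valid sub-pattern.
W = [[1, 1, -2, 0, 0, 0],
     [1, 0, 0, 1, -2, 0],
     [0, 1, 0, -2, 0, 1],
     [0, 0, 1, 0, 1, -2]]
# Two toy planes with one cluster each; pattern neurons 3..5 belong to both clusters.
planes = [[(W, [0, 1, 2, 3, 4, 5])], [(W, [3, 4, 5, 6, 7, 8])]]
noisy = [1, 2, 1, 1, 2, 1, 1, 1, 1]        # all-ones pattern with two +1 errors
recovered = correct_coupled(planes, noisy, phi=0.95, t_max=3)
```

In this toy run the stored all-ones pattern is recovered. In general, a cluster whose constraints remain unsatisfied keeps its previous values and is revisited in the next sweep.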

We consider two variants of the above error correction algorithm. In the first one, called constrained coupled neural error 
correction, we provide the network with some side information during the recall phase. This is equivalent to "freezing" a few 
of the pattern neurons to known correct states, similar to spatially-coupled codes [13], [14]. In the case of neural associative
memory, the side information can come from the context. For instance, when trying to fill in the blank in the sentence "The 
_ at flies", we can use the side information (flying) to guess the correct answer among multiple choices. Without this side 
information, we cannot tell if _ at corresponds to bat or cat. 

In the other variant, called unconstrained coupled neural error correction, we perform the error correction without providing 
any side information. This is similar to many standard recall algorithms in neural networks and serves as a benchmark to 
compare our method with those of other work [5], [11].

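Let $z^{(\ell)}(t)$ denote the average probability of error for pattern nodes across neural plane $\ell$ in iteration $t$. A cluster node in
plane $\ell$ receives noisy messages from its neighbors with an average probability of $\bar{z}^{(\ell)}$, where

$$\bar{z}^{(\ell)} = \frac{1}{2\Omega+1} \sum_{j=-\Omega}^{\Omega} z^{(\ell-j)} \quad \text{s.t.} \quad z^{(\ell)} = 0, \ \forall \ell \notin \{1, \ldots, L\}.$$

In the following, we derive a recursive expression for $z^{(\ell)}(t)$ (all the proofs are provided in the Appendix).

Lemma 2: Let us define $g(z) = 1 - \rho(1-z) - z\rho'(1-z)$ and $f(z; p_e) = p_e \lambda(z)$. Then,

$$z^{(\ell)}(t+1) = f\left( \frac{1}{2\Omega+1} \sum_{i=-\Omega}^{\Omega} g\left(\bar{z}^{(\ell-i)}(t)\right); p_e \right). \tag{2}$$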
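
The recursion of Lemma 2 can be iterated numerically, as in the sketch below. Several simplifications are assumed and are not taken from the paper: all cluster nodes have the same degree `dc`, all pattern nodes have the same degree `dv` with $\lambda(z) = z^{dv-1}$, $g$ is computed directly as the probability that two or more of `dc` incoming messages are erroneous, and planes outside $1, \ldots, L$ contribute zero to both averages.

```python
def g(z, dc):
    """Probability that a cluster of degree dc receives two or more erroneous inputs."""
    return 1.0 - (1.0 - z) ** dc - dc * z * (1.0 - z) ** (dc - 1)

def f(z, p_e, dv):
    """f(z; p_e) = p_e * lambda(z), with lambda(z) = z**(dv - 1) assumed here."""
    return p_e * z ** (dv - 1)

def window_average(values, center, omega):
    """Average over planes center-omega..center+omega; missing planes count as 0."""
    total = 0.0
    for j in range(-omega, omega + 1):
        idx = center - j
        if 0 <= idx < len(values):
            total += values[idx]
    return total / (2 * omega + 1)

def evolve(p_e, L, omega, dv, dc, iterations):
    """Iterate the coupled recursion, starting from z = p_e in every plane."""
    z = [p_e] * L
    for _ in range(iterations):
        z_bar = [window_average(z, l, omega) for l in range(L)]
        pi = [g(zb, dc) for zb in z_bar]
        z = [f(window_average(pi, l, omega), p_e, dv) for l in range(L)]
    return z

# Hypothetical parameters: 10 planes, coupling width omega = 1, dv = 3, dc = 8.
z_final = evolve(p_e=0.01, L=10, omega=1, dv=3, dc=8, iterations=20)
```

For a small initial error probability such as the one used here, the per-plane error probabilities shrink rapidly toward zero.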

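The decoding will be successful if $z^{(\ell)}(t+1) < z^{(\ell)}(t)$ for all $\ell$. As a result, we look for the maximum $p_e$ such that the right-hand side of Eq. (2) is smaller than $z^{(\ell)}(t)$ in every plane $\ell$.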

Let $p_e^*$ and $p_e^\dagger$ denote the maximum such $p_e$ (the decoding threshold) for the constrained and unconstrained coupled system, respectively. We
will use the analytical approaches recently proposed in [14] to find these thresholds. To this end, a potential function is defined
to track the evolution of the error probability in Eq. (2). Let $\mathbf{f}(\mathbf{z}; p_e): \mathbb{R}^n \to \mathbb{R}^n$ and $\mathbf{g}(\mathbf{z}): \mathbb{R}^n \to \mathbb{R}^n$ be two component-wise
vector functions operating on a vector $\mathbf{z}$ such that $[\mathbf{f}(\mathbf{z}; p_e)]_i = f(z_i; p_e)$ and $[\mathbf{g}(\mathbf{z})]_i = g(z_i)$, where $f(z_i; p_e)$ and $g(z_i)$ are
defined in Lemma 2. Using these definitions, we can rewrite Eq. (2) in the vector form as [14]:

$$\mathbf{z}(t+1) = A^\top \mathbf{f}(A \mathbf{g}(\mathbf{z}(t)); p_e) \tag{3}$$

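where $A$ is the coupling matrix.
At this point, the potential function of the unconstrained coupled system could be defined as [14]:

$$U(\mathbf{z}; p_e) = \int_C \mathbf{g}'(\mathbf{u}) \left( \mathbf{u} - A^\top \mathbf{f}(A \mathbf{g}(\mathbf{u}); p_e) \right) \cdot d\mathbf{u} = \mathbf{g}(\mathbf{z})^\top \mathbf{z} - G(\mathbf{z}) - F(A \mathbf{g}(\mathbf{z}); p_e) \tag{4}$$

where $\mathbf{g}'(\mathbf{z}) = \mathrm{diag}([g'(z_i)])$, $G(\mathbf{z}) = \int_C \mathbf{g}(\mathbf{u}) \cdot d\mathbf{u}$ and $F(\mathbf{z}; p_e) = \int_C \mathbf{f}(\mathbf{u}; p_e) \cdot d\mathbf{u}$.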

The potential function is defined in such a way that $\min\{U(\mathbf{z}; p_e)\} > 0$ for $p_e < p_e^*$. In contrast, for the unconstrained coupled
system, we have $U'(\mathbf{z}; p_e) > 0$ for $p_e < p_e^\dagger$. In other words, in order to find $p_e^*$, it is sufficient to find the maximum $p_e$ such that
$\min\{U(\mathbf{z}; p_e)\} > 0$ [14]. We will use this fact and compare the two thresholds later in the simulations section. Intuitively, we
expect to have $p_e^\dagger < p_e^*$ (side information only helps), and as a result a better error correction performance for the constrained
system.
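
A rough numerical way to locate a threshold of this kind is sketched below for a simplified, uncoupled version of the recursion, $z \leftarrow f(g(z); p_e)$, in which the averaging over neighboring planes is ignored. The condition checked is $f(g(z); p_e) < z$ on a grid over $(0, p_e]$, which mirrors the requirement that the error probability decreases in every iteration. The regular degrees `dv` and `dc`, the grid size and the tolerance are assumptions of this example, so the resulting value is illustrative and is not the threshold of the coupled systems studied here.

```python
def g(z, dc):
    """Probability that a cluster of degree dc receives two or more erroneous inputs."""
    return 1.0 - (1.0 - z) ** dc - dc * z * (1.0 - z) ** (dc - 1)

def f(z, p_e, dv):
    """f(z; p_e) = p_e * lambda(z), with lambda(z) = z**(dv - 1) assumed here."""
    return p_e * z ** (dv - 1)

def converges(p_e, dv, dc, grid=2000):
    """Check f(g(z); p_e) < z at evenly spaced points z in (0, p_e]."""
    for k in range(1, grid + 1):
        z = p_e * k / grid
        if f(g(z, dc), p_e, dv) >= z:
            return False
    return True

def threshold(dv, dc, tol=1e-4):
    """Bisection for the largest p_e at which the grid check still passes."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if converges(mid, dv, dc):
            lo = mid
        else:
            hi = mid
    return lo

p_dagger = threshold(dv=3, dc=8)   # hypothetical degrees
```

The bisection relies on the check being monotone in $p_e$, which holds here because $f$ is increasing in $p_e$ and the tested interval grows with $p_e$.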

The next theorem shows that for $p_e < p_e^\dagger$, the error probability of the unconstrained coupled system goes to zero.

Theorem 3: For $p_e < p_e^\dagger$, the fixed point of Eq. (2) is $0$ and is achieved by iterative updates of Algorithm 2.

Similarly, to address the performance of the constrained coupled system, we use the results of [14] and [13].

Theorem 4: For the constrained coupled neural associative memory, when $p_e < p_e^*$ the potential function decreases in each
iteration. Furthermore, if $L > \|U''(\mathbf{z}; p_e)\|_\infty / \Delta E(p_e)$, the only fixed point of Eq. (3) is $0$.

Pattern retrieval capacity: In this part, we prove that the number of patterns that can be memorized by the proposed scheme
is exponential in $n$, the pattern size. First, note that this number is only a function of the size of the subspace that sub-patterns
come from. In other words, the number of patterns $C$ which the network memorizes does not depend on the learning or recall
algorithms (except for its obvious effect on the running time). Therefore, in order to prove that $C$ could exponentially scale
with $n$, we show that there exists a subspace with an exponentially large number of members (in terms of $n$). The following
theorem is the extension of the one we proved in [11], [16] to coupled neural associative networks.

Theorem 5: Let $\mathcal{X}$ be the $C \times n$ dataset matrix, formed by $C$ vectors of length $n$ with entries from the set $\mathcal{S} = \{0, \ldots, S-1\}$. Let also $k = rn$
for some $0 < r < 1$. Then, there exists a set of patterns for which $C = a^{rn}$, with $a > 1$, and $\mathrm{rank}(\mathcal{X}) = k < n$.



Matrix A corresponds to the unconstrained system. A similar matrix can be defined for the constrained case. 




Fig. 3: The final pattern error probability for the constrained and unconstrained coupled neural systems (horizontal axis: initial bit error probability).

V. Simulations 

In this paper, we are mainly interested in the performance of the recall phase and demonstrate a way, by means of
spatial coupling, to improve upon the prior art. To this end, we assume that the learning phase is done (by using our
proposed algorithm in [16]) and we have the weighted connectivity graphs available. For ease of presentation, we
simply produce these matrices by generating sparse random bipartite graphs and assigning random weights to the connections.
Given the weight matrices and the fact that they are orthogonal to the sub-patterns, we can assume w.l.o.g. that in the recall
phase we are interested in recalling the all-zero pattern from its noisy version.

We treat the patterns in the database as 2D images of size $64 \times 64$. More precisely, we have generated a random network
with 29 planes and 29 clusters within each plane (i.e., $L = D = 29$). Each local cluster is composed of $8 \times 8$ neurons and
each pattern neuron (pixel) is connected to 2 consecutive planes and 2 clusters within each plane (except at the boundaries).
This is achieved by moving the $8 \times 8$ rectangular window over the 2D pattern horizontally and vertically.

We investigated the performance of the recall phase by randomly generating a 2D noise pattern in which each entry is set
to $+1$ or $-1$ with probability $p_e/2$ each and to $0$ with probability $1 - p_e$. We then apply Algorithm 2 to eliminate the noise. Once finished,
we declare failure if the output of the algorithm, $\hat{x}$, is not equal to the pattern $x$ (assumed to be the all-zero vector).

Figure 3 illustrates the final error rate of the proposed algorithm for the constrained and unconstrained systems. To obtain
the final recall error probability, we have run the sweeps of Algorithm 2 with $t_{\max} = 10$. For the constrained system, we fixed
the state of a patch of neurons of size $3 \times 3$ at the four corners of the 2D pattern. In the same figure, the results are also
compared to the similar algorithms in [5] and [11]. In [5], there is no clustering, while in [11] the network is divided into
some non-overlapping clusters with a second neural level to compensate for the lack of collaboration among non-overlapping
clusters. As is evident from the figure, the performance of the proposed algorithms in this paper is significantly better than that of the
ones in [5] and [11].

VI. Conclusions 

In this paper, we proposed a novel architecture for neural associative memories. The proposed model comprises a set of neural 
planes with sparsely connected overlapping clusters. Given the similarity of the suggested framework to spatially-coupled codes, 
we employed recent developments in analyzing these codes to investigate the performance of our proposed neural algorithm. 
We also presented numerical simulations that lend additional support to the theoretical analysis.
Appendix

A. Proof of Lemma 2

Let $z^{(\ell)}(t)$ denote the average probability of error for pattern nodes across neural plane $\ell$ in iteration $t$. Furthermore,
let $\pi^{(\ell)}(t)$ be the average probability of a cluster neuron in plane $\ell$ sending an erroneous message to its neighbors. We will
derive recursive expressions for $z^{(\ell)}(t)$ and $\pi^{(\ell)}(t)$.

A cluster node in plane $\ell$ receives noisy messages from its neighbors with an average probability of $\bar{z}^{(\ell)}$, where

$$\bar{z}^{(\ell)} = \frac{1}{2\Omega+1} \sum_{j=-\Omega}^{\Omega} z^{(\ell-j)},$$

with $z^{(i)} = 0$ for $i \le 0$ and $i > L$.

Let $\pi_i^{(\ell)}$ denote the probability that a cluster node with degree $i$ in plane $\ell$ sends an erroneous message to its neighboring
pattern nodes. Then, knowing that each cluster is capable of correcting at least one error, $\pi_i^{(\ell)}$ is equal to the probability of
receiving two or more noisy messages from pattern neurons,

$$\pi_i^{(\ell)} = 1 - \left(1 - \bar{z}^{(\ell)}\right)^i - i\, \bar{z}^{(\ell)} \left(1 - \bar{z}^{(\ell)}\right)^{i-1}.$$

Now, letting $\pi^{(\ell)}(t)$ denote the average probability of cluster nodes in plane $\ell$ sending erroneous messages in iteration $t$,
we will have

$$\pi^{(\ell)}(t) = \mathbb{E}\{\pi_i^{(\ell)}\} = \sum_i \rho_i \pi_i^{(\ell)} = 1 - \rho\left(1 - \bar{z}^{(\ell)}(t)\right) - \bar{z}^{(\ell)}(t)\, \rho'\left(1 - \bar{z}^{(\ell)}(t)\right),$$

where $\rho(z) = \sum_i \rho_i z^i$ is the cluster node degree distribution polynomial and $\rho'(z) = d\rho(z)/dz$.
To simplify notations, let us define the function $g(z) = 1 - \rho(1-z) - z\rho'(1-z)$ such that $\pi^{(\ell)}(t) = g(\bar{z}^{(\ell)}(t))$.

Now consider a given pattern neuron with degree $j$ in plane $\ell$. Let $z_j^{(\ell)}(t+1)$ denote the probability of sending
an erroneous message by this node in iteration $t+1$. Then, $z_j^{(\ell)}(t+1)$ is equal to the probability of this node being noisy in the first place ($p_e$)
and having all its cluster nodes sending erroneous messages in iteration $t$, the average probability of which is

$$\bar{\pi}^{(\ell)}(t) = \frac{1}{2\Omega+1} \sum_{i=-\Omega}^{\Omega} \pi^{(\ell-i)}(t).$$

Now, since $z^{(\ell)}(t+1) = \mathbb{E}\{z_j^{(\ell)}(t+1)\}$, we get

$$z^{(\ell)}(t+1) = p_e \sum_j \lambda_j \left(\bar{\pi}^{(\ell)}(t)\right)^j = p_e \lambda\left(\bar{\pi}^{(\ell)}(t)\right).$$

Again to simplify the notation, let us define the function $f(z; p_e) = p_e \lambda(z)$. This way, we will have the recursion as:

$$z^{(\ell)}(t+1) = f\left( \frac{1}{2\Omega+1} \sum_{i=-\Omega}^{\Omega} g\left(\bar{z}^{(\ell-i)}(t)\right); p_e \right).$$



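B. Proof of Theorem 3

The proof is straightforward and results from the definition of the potential. Because of the definition of $p_e^\dagger$, we know that
for $p_e < p_e^\dagger$ we have

$$U'(\mathbf{z}; p_e) > 0.$$

From equation (4), we have

$$U'(\mathbf{z}; p_e) = \mathbf{g}'(\mathbf{z}) \left( \mathbf{z} - A^\top \mathbf{f}(A \mathbf{g}(\mathbf{z}); p_e) \right).$$

Given that $\mathbf{g}'(\mathbf{z}) > 0$, from $U'(\mathbf{z}; p_e) > 0$ we conclude that

$$\mathbf{z} - A^\top \mathbf{f}(A \mathbf{g}(\mathbf{z}); p_e) > 0.$$

This is equivalent to saying that

$$\mathbf{z}(t+1) = A^\top \mathbf{f}(A \mathbf{g}(\mathbf{z}(t)); p_e) < \mathbf{z}(t).$$

Therefore, the error probabilities decrease in every iteration and converge to the fixed point $\mathbf{0}$.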

C. Proof of Theorem 4

The proof of the theorem relies on results from [13] to show that the entries in the vector $\mathbf{z}(t) = [z^{(1)}(t), \ldots, z^{(L)}(t)]$ are
non-decreasing, i.e.,

$$z^{(1)}(t) \le z^{(2)}(t) \le \cdots \le z^{(L)}(t).$$

This can be shown by induction, using the fact that the functions $f(\cdot; p_e)$ and $g(\cdot)$ are non-decreasing (see the proof of Lemma 22 in
[13] for more details).

Then, one can apply the result of Lemma 3 in [14] to show that the potential function of the constrained coupled system
decreases in each iteration. Finally, when

$$L > \|U''(\mathbf{z}; p_e)\|_\infty / \Delta E(p_e),$$

one could apply Theorem 1 of [14] to show the convergence of the probability of errors to zero.

D. Proof of Theorem 5

The proof is based on construction: we construct a data set $\mathcal{X}$ with the required properties such that it can be memorized
by the proposed neural network. To simplify the notations, we assume all the clusters have the same number of pattern and
constraint neurons, denoted by $\tilde{n}_c$ and $\tilde{m}_c$. In other words, $n_{\ell,d} = \tilde{n}_c$ and $m_{\ell,d} = \tilde{m}_c$ for all $\ell \in \{1, \ldots, L\}$ and $d \in \{1, \ldots, D\}$.

We start by considering a matrix $G \in \mathbb{R}^{k \times n}$, with non-negative integer-valued entries between $0$ and $\gamma - 1$ for some $\gamma \ge 2$.
We also assume $k = rn$, with $0 < r < 1$.

To construct the database, we first divide the columns of $G$ into $L$ sets, each corresponding to the neurons in one plane.
Furthermore, in order to ensure that all the sub-patterns within each cluster form a subspace with dimension less than $\tilde{n}_c$,
we propose the following structure for the generator matrix $G$. This structure ensures that the rank of any sub-matrix of $G$
composed of $\tilde{n}_c$ columns is less than $\tilde{n}_c$. In the matrices below, the hatched blocks represent parts of the matrix with some
non-zero entries. To simplify visualization, let us first define the sub-matrix $\tilde{G}$ as the building block of $G$:

[Figure omitted: the sub-matrix $\tilde{G}$, a block spanning $n/(D \cdot L)$ columns.]

Then, $G$ is structured as

[Figure omitted: block structure of $G$.]

where each hatched block represents a random realization of $\tilde{G}$.

Now consider a random vector $u \in \mathbb{R}^k$ with integer-valued entries between $0$ and $\nu - 1$, where $\nu \ge 2$. We construct the
dataset by assigning the pattern $x \in \mathcal{X}$ to be $x = u \cdot G$, if all the entries of $x$ are between $0$ and $S - 1$. Obviously, since both
$u$ and $G$ have only non-negative entries, all entries in $x$ are non-negative. Therefore, it is the $S - 1$ upper bound that we have
to worry about.

Let $g_j$ denote the $j$-th column of $G$. Then the $j$-th entry in $x$ is equal to $x_j = u \cdot g_j$. Suppose $g_j$ has $d_j$ non-zero elements.
Then, we have:

$$x_j = u \cdot g_j \le d_j (\gamma - 1)(\nu - 1).$$

Therefore, letting $d^* = \max_j d_j$, we could choose $\gamma$, $\nu$ and $d^*$ such that

$$S - 1 \ge d^* (\gamma - 1)(\nu - 1) \tag{5}$$

to ensure all entries of $x$ are less than $S$.

As a result, since there are $\nu^k$ vectors $u$ with integer entries between $0$ and $\nu - 1$, we will have $\nu^k = \nu^{rn}$ patterns forming
$\mathcal{X}$. This means $C = \nu^{rn}$, which is an exponential number in $n$ if $\nu \ge 2$.
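
The counting argument can be checked on a toy instance, as in the sketch below. It is a simplified illustration: the generator matrix is a plain sparse random matrix and does not reproduce the block structure over clusters and planes described in the proof, and all parameter values and function names are hypothetical.

```python
import random

def build_generator(k, n, gamma, d_star, rng):
    """Random k x n matrix with entries in {0..gamma-1} and d_star non-zeros per column."""
    G = [[0] * n for _ in range(k)]
    for j in range(n):
        for i in rng.sample(range(k), d_star):
            G[i][j] = rng.randrange(1, gamma)
    return G

def make_pattern(u, G):
    """Return x = u . G as a list of integers."""
    return [sum(u_i * row[j] for u_i, row in zip(u, G)) for j in range(len(G[0]))]

rng = random.Random(1)
k, n, gamma, nu, d_star = 4, 8, 3, 2, 2
S = d_star * (gamma - 1) * (nu - 1) + 1    # smallest S satisfying Eq. (5)
G = build_generator(k, n, gamma, d_star, rng)

# Enumerate all nu**k coefficient vectors u with entries in {0..nu-1}.
patterns = []
for index in range(nu ** k):
    u = [(index // nu ** i) % nu for i in range(k)]
    patterns.append(make_pattern(u, G))

largest_entry = max(max(x) for x in patterns)
```

With $S$ chosen according to Eq. (5), every generated entry stays within $\{0, \ldots, S-1\}$, and the number of coefficient vectors is $\nu^k$.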

References 

[1] J. J. Hopfield, Neural networks and physical systems with emergent collective computational abilities, Proc. Natl. Acad. Sci., Vol. 79, 1982, pp. 2554-2558.
[2] S. S. Venkatesh, D. Psaltis, Linear and logarithmic capacities in associative neural networks, IEEE Trans. Inf. Theory, Vol. 35, No. 3, 1989, pp. 558-568.
[3] S. Jankowski, A. Lozowski, J. M. Zurada, Complex-valued multistate neural associative memory, IEEE Trans. Neur. Net., Vol. 7, No. 6, 1996, pp. 1491-1496.
[4] M. K. Muezzinoglu, C. Guzelis, J. M. Zurada, A new design method for the complex-valued multistate Hopfield associative memory, IEEE Trans. Neur. Net., Vol. 14, No. 4, 2003, pp. 891-899.
[5] K. R. Kumar, A. H. Salavati, A. Shokrollahi, Exponential pattern retrieval capacity with non-binary associative memory, Proc. IEEE Information Theory Workshop, 2011.
[6] D. O. Hebb, The organization of behavior, New York: Wiley & Sons, 1949.
[7] R. McEliece, E. Posner, E. Rodemich, S. Venkatesh, The capacity of the Hopfield associative memory, IEEE Trans. Inf. Theory, Jul. 1987.
[8] V. Gripon, C. Berrou, Sparse neural networks with large learning diversity, IEEE Trans. Neur. Net., Vol. 22, No. 7, 2011, pp. 1087-1096.
[9] C. Berrou, V. Gripon, Coded Hopfield networks, Proc. Symp. on Turbo Codes and Iterative Information Processing, 2010, pp. 1-5.
[10] P. Peretto, J. J. Niez, Long term memory storage capacity of multiconnected neural networks, Biological Cybernetics, Vol. 54, No. 1, 1986, pp. 53-63.
[11] A. H. Salavati, A. Karbasi, Multi-level error-resilient neural networks, IEEE Int. Symp. Inf. Theory (ISIT), 2012.
[12] J. Hertz, A. Krogh, R. G. Palmer, Introduction to the theory of neural computation, USA: Addison-Wesley, 1991.
[13] S. Kudekar, T. Richardson, R. Urbanke, Threshold saturation via spatial coupling: why convolutional LDPC ensembles perform so well over the BEC, IEEE Trans. Inf. Theory, Vol. 57, No. 2, 2011, pp. 803-834.
[14] A. Yedla, Y. Jian, P. S. Nguyen, H. D. Pfister, A simple proof of threshold saturation for coupled scalar recursions, to appear in Int. Symp. on Turbo Codes and Iterative Information Processing (ISTC), 2012.
[15] Y. Jian, H. D. Pfister, K. R. Narayanan, Approaching capacity at high rates with iterative hard-decision decoding, IEEE Int. Symp. Inf. Theory (ISIT), 2012.
[16] A. Karbasi, A. H. Salavati, A. Shokrollahi, Iterative learning and denoising in neural associative memories, to appear in ICML 2013.
[17] D. S. Modha, R. Ananthanarayanan, S. K. Esser, A. Ndirango, A. J. Sherbondy, R. Singh, Cognitive computing, Communications of the ACM, Vol. 54, No. 8, 2011, pp. 62-71.
[18] P. Dayan, L. F. Abbott, Theoretical neuroscience: computational and mathematical modeling of neural systems, MIT Press, 2004.



